REVIEWS


HORIZONTAL GENE TRANSFER, 
GENOME INNOVATION AND 
EVOLUTION 


J. Peter Gogarten and Jeffrey P. Townsend

Abstract | To what extent is the tree of life the best representation of the evolutionary history of 
microorganisms? Recent work has shown that, among sets of prokaryotic genomes in which 
most homologous genes show extremely low sequence divergence, gene content can vary 
enormously, implying that those genes that are variably present or absent are frequently 
horizontally transferred. Traditionally, successful horizontal gene transfer was assumed to 
provide a selective advantage to either the host or the gene itself, but could horizontally 
transferred genes be neutral or nearly neutral? We suggest that for many prokaryotes, the 
boundaries between species are fuzzy, and therefore the principles of population genetics 
must be broadened so that they can be applied to higher taxonomic categories. 


TREE OF LIFE 

The tree-like representation of 
the history of all living and 
extinct organisms. 

MUTUALISM 

An association between two 
organisms, often from different 
species, that benefits both 
partners. 

RETICULATION 
A network that is formed 
through the fusion of 
independent branches. 

Bifurcating trees, in which evolutionary lineages
split and evolve independently from each other, have 
a long history as tools to visualize the evolution of 
species: Lamarck introduced tree-like binary schemes 
for taxonomic classification 1 and Charles Darwin 
described the evolution of species as the tree of life 2 . 
Darwin also noted that the ‘coral of life’ might be a 
more appropriate metaphor, because only the outermost
layer in the tree of life is actually alive, resting
on a base of dead branches 3 . The tree of life became 
the standard imagery to depict species evolution, 
implying a common root of all life on Earth and a 
bifurcating evolutionary process. However, there are 
several clear exceptions to this standard view. For 
example, botanists found that many plant species 
violate a bifurcating model as they are allopolyploid, 
combining the genomes of different parental species.
This process of a new line of descent originating
from the hybridization of two parent species has
been termed reticulate evolution 4-8 . The fungal-algal 
symbiosis of lichens illustrates that symbiosis can 
lead to long-term partnerships with different properties
from those of either parent species, and some
of the most dramatic breakthroughs in cellular
evolution, that is, the mitochondria and plastids,
are the result of endosymbiosis 9 . Throughout the 
decades, mutualism and reticulation have often been 
considered the most important processes in species 
evolution 10,11 . However, for most branches of biology 
these processes were only exceptions, albeit important 
ones, in an otherwise steadily furcating process of 
species evolution. 

By introducing ribosomal RNA (rRNA) as a taxonomic
marker molecule, Woese and Fox extended the
tree paradigm to the realm of microorganisms 12,13 . 
However, the large-scale availability of sequence data 
provided information that effectively sundered the 
cambium of the tree of life metaphor. Different molecules
were shown to have different histories 14 , and
members of the same species were found to differ dramatically
in gene content. For example, of the genes
revealed by the sequencing of three Escherichia coli 
genomes, fewer than 40% were common to all three 15 . 
Furthermore, it has been suggested that extinct species 
have contributed genes to the extant layer of life, even 
though these contributors might not have been in the 
direct line of ancestry 16,17 . Reticulate models of evolutionary
history that incorporate gene transfer might
even provide an opportunity to learn about organisms
that became extinct hundreds of millions of years ago.

PHYLOGENY

The origin and evolution of a
group of organisms, usually of
species. Phylogenies are not
necessarily tree-like. The term
phylogeny is frequently applied
to genes and different levels of
taxonomic units. These are
often labelled as operational
taxonomic units.

GENE TREE

A depiction of the history of
families of homologous genes.
Intragenic recombination can
give rise to non-tree-like gene
phylogenies.

HOMOLOGUES
Characters or sequences that
are derived from the same
ancestral feature.

It is now known that organismal mutualisms and 
lineage reticulation are supplemented by horizontal 
gene transfer (HGT) as processes that lead to the 
network-like histories of living organisms. Over short 
time intervals, an organismal line of descent could 
be defined as a ‘plurality consensus’ of gene histories. 
However, these organismal lines of descent are embedded
in a web of gene phylogenies that form connections
between the different branches of the tree of life 17,18 . It 
now appears that all functional categories of genes are 
susceptible to HGT, even rRNA operons 19 and genes 
associated with phylum-defining characteristics, such 
as the photosynthetic machinery 20 . However, not all 
genes are equally itinerant. Some clearly have a higher 
propensity for transfer than others 21 , and not all groups 
of organisms experience HGT to the same extent 22,23 . 

Patterns from HGT versus shared ancestry 

One of the predicted outcomes of high levels of HGT 
between preferred partners is the observation of robust 
gene trees, the implications of which are indistinguishable
from the signals produced by recent shared
ancestry 19,24 . In particular, two predictions are worth 
considering: organisms that frequently give or receive 
genes from sister taxa will group together in most gene 
phylogenies (that is, the phylogenies of the transferred
genes), and organisms not participating in HGT with
sister taxa should be left ‘basal’ by those that are, and
these non-participatory lineages should be recovered
as deep branching lineages.

The Thermotogales provide an interesting illustration
of this point. This group of extreme thermophiles
is recovered as a deep branching lineage of Bacteria 
when using individual gene phylogenies 13,25 as well as 
using whole-genome-based analyses (see ref. 26 for 
a recent review). When the genome of Thermotoga 
maritima was sequenced, >20% of the open reading 
frames were reported to be most similar to homologues 
from the Archaea 27 . The Thermotogales share their 
environment mainly with Archaea; FIG. 1a illustrates
that many of the proteins encoded in the T. maritima
genome are less divergent from their archaeal homologues
than is found for other Bacteria with comparably
sized genomes, such as Streptococcus thermophilus.
Proteins with slight divergence, which places them
in the left-hand tail of the T. maritima distribution,
presumably correspond to those that have been horizontally
transferred from archaeal taxa at some time
subsequent to the Archaea-Bacteria split. Imposing a
rate of HGT that results in a lower divergence for 3%
of loci on the actual S. thermophilus gene-divergence
diagram produces a left-hand tail in the distribution
of gene divergences (FIG. 1b) similar to that observed for
T. maritima (FIG. 1a).

Figure 1 | The effects of gene transfer on sequence divergence. a | Histogram showing the number of encoded proteins
with different levels of relative sequence divergence. The divergence from the most similar euryarchaeal homologue was 
calculated for all annotated open reading frames of a Gram-positive bacterium with little interdomain transfer (Streptococcus 
thermophilus LMG 18311, red) and for the extremely thermophilic bacterium Thermotoga maritima (blue). For each encoded 
protein, BLAST searches were carried out against the proteins in five archaeal genomes (Pyrococcus abyssi, Pyrococcus
furiosus, Archaeoglobus fulgidus, Methanocaldococcus jannaschii and Methanothermobacter thermautotrophicus). The
S. thermophilus genome encodes 1,889 currently annotated open reading frames, 851 of which have a significant match in at
least one of the euryarchaeal genomes (E value <10⁻³); the T. maritima genome encodes 1,858 proteins, 1,193 of which have a
significant match in at least one of the archaeal genomes. The bitscore divided by the alignment lengths was used as a measure 
of sequence similarity. Relative sequence divergence between two sequences was calculated as (1-similarity(b_a)/similarity(b_b)), 
where similarity(b_a) is the similarity score for a bacterial sequence with the most similar archaeal sequence, and similarity(b_b) is 
the similarity score of the bacterial sequence compared with itself. Two thirds of those T. maritima genes with <45% sequence 
divergence are classified as encoding genes that fall into the metabolism category in the COG (clusters of orthologous groups) 
database 100,101 , whereas only a third of all T. maritima genes fall into this category. Note that the tail in the distribution, owing to 
the presence of sequences with little divergence, is absent in the case of S. thermophilus, suggesting transfer into the 
Thermotoga lineage as the most probable explanation. b | An example of the effect of horizontal gene transfer (HGT) on the
distribution of the divergence of genes. In red, the distribution for the percentage amino-acid divergence of genes of 
S. thermophilus. In blue, the expected distribution of divergences of genes given the same gene-specific divergence rates from 
a donor taxon, but in addition incorporating a low rate of gene transfer from the donor taxon, affecting 3% of the genes. Details 
of how the expectations were calculated can be obtained from the authors. This calculation is meant to be merely illustrative; the 
development of precise quantitative methods for testing well specified models of HGT is vital to future studies of particular taxa. 
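
The divergence measure defined in this caption can be written out as a short calculation. The following is a minimal sketch in Python, assuming that the bitscore and alignment length of each BLAST hit are already available; the function names and the numerical values are hypothetical and are used only for illustration, not taken from the analysis itself.

```python
def similarity(bitscore: float, alignment_length: int) -> float:
    """Similarity as defined in the caption: bitscore per aligned position."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return bitscore / alignment_length


def relative_divergence(sim_b_a: float, sim_b_b: float) -> float:
    """Relative divergence: 1 - similarity(b_a) / similarity(b_b)."""
    if sim_b_b <= 0:
        raise ValueError("self-similarity must be positive")
    return 1.0 - sim_b_a / sim_b_b


# Hypothetical BLAST results for one bacterial protein (illustrative values only).
self_hit = similarity(bitscore=600.0, alignment_length=300)      # protein vs itself
archaeal_hit = similarity(bitscore=290.0, alignment_length=290)  # best archaeal match
divergence = relative_divergence(archaeal_hit, self_hit)
print(f"relative divergence: {divergence:.2f}")
```

Normalizing by the self-comparison puts every protein on the same scale, so that a protein identical to its archaeal homologue scores 0 and increasingly dissimilar proteins approach 1.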

Figure 2 | The tree of life. A sketch of the tree of life as it is frequently derived from genome
data (for example, REF. 26), with the three possible positions of Thermotoga maritima marked
according to (a) ‘concordant’ genes (placed with the Gram-positives), (b) 16S rRNA (and other
conserved genes) and whole-genome analyses (placed as an early diverging lineage) and (c) 
phylogenetically discordant genes (placed with the Pyrococci among the Archaea). For further 
discussion see ref. 28 and text. 


SUPERTREES 

Trees calculated from smaller 
trees with sets of overlapping 
operational taxonomic units. 

ORTHOLOGUES 
Homologues that are related to 
each other through a speciation 
event. 

PURIFYING SELECTION 
Kimura’s neutral theory of 
molecular evolution posits that 
most variations observed at the 
molecular level do not provide a 
selective advantage or 
disadvantage. However, many 
nucleotide mutations are never 
observed in a population 
because they are associated with 
a strong selective disadvantage. 
This is the case for mutations 
that change a catalytically 
important amino acid. The 
selection that prevents these 
detrimental mutations from 
becoming fixed in a population 
is known as purifying selection. 


Recently, Gophna et al. 28 constructed separate
genome-based trees from different sets of genes: those
that are frequently found to be in phylogenetic agreement
with one another and those that are frequently
found to be phylogenetically discordant. They found
that the phylogenetically discordant genes group
T. maritima among the Archaea as a sister group to the
Pyrococci. By contrast, the concordant genes group
T. maritima within the Bacteria at the base of the Gram-positive
Bacteria (FIG. 2). A phylogenetic affiliation for
T. maritima similar to that for concordant genes was
recovered by Daubin and colleagues 29 using a supertree
approach on stringently selected sets of orthologous
genes. According to phylogenetic analysis of infrequently
horizontally transferred genes, T. maritima
seems to be Gram-positive, but frequently transferred 
genes group it inside the Archaea domain. However, 
if all genes are considered, T. maritima is recovered as 
a deep branching lineage of Bacteria. These findings 
indicate that the deep branching bacterial phylogenetic 
position is an artefact, resulting from analyses that 
combine genes with different phylogenetic histories. 

Surprisingly, T. maritima is also frequently recovered
as a deep branching lineage of Bacteria when
using some of the most trusted individual phylogenetic 
markers 13,25,30 . Brochier and Philippe 31 suggest that these
results could also be artefacts of phylogenetic reconstruction.
Alternatively, some molecular improvements
might have spread more recently within the
Bacteria by HGT, leaving the Thermotogales with the 
more ancient and less altered versions (see discussion 
on species boundaries below). If one considers that, 
owing to recombination, slowly evolving molecules 
themselves might be mosaic 19,32-36 , then more recent 
HGT among the non-Thermotogales could explain the 
similarity between whole-genome phylogenies and single
conserved molecules. Interestingly, this explanation
rescues the extremely thermophilic Bacteria as a model 
for early Bacteria: although they no longer represent 
deep branching organismal lineages, they have perhaps
not shared many of the improvements that were
recently exchanged among the mesophilic Bacteria. 


Improving detection of HGT 

Several methods have been developed to detect horizontally
transferred genes. One of the most popular
methods is the detection of codon or nucleotide 
compositional bias 37-40 . This approach identifies many 
recently transferred genes 41 . However, it has not been 
unequivocally shown that HGT is the sole cause of 
unusual compositional bias. Furthermore, not all 
recently acquired genes show compositional bias 42 ; it is 
even conceivable that some of those recently acquired 
genes that increase the fitness of the recipient show a 
weaker compositional bias. In principle, examination 
of the phylogenetic conflict among loci is the most 
direct approach to screen for horizontally transferred 
genes. For example, heat shock protein homologues 
(HSP70) group the Archaea among the Bacteria 43-46 , 
and many proteins in T. maritima are most similar to 
orthologues from the Pyrococci 27 , presumably because 
some archaea acquired an HSP70 homologue gene 
from a bacterium, and the Thermotoga lineage incorporated
many genes from the Pyrococci or their relatives.
Computer programs have been developed that
automate the assembly of gene families and reconstruct 
their phylogeny to detect HGT in genome analyses 47-50 . 
The use of phylogenetic reconstruction promises more 
reliable detection of HGT events than simple database 
searches 51 . With more genome sequences available for 
closely related organisms, these approaches promise to 
become even more useful. 

However, there are several potential pitfalls to 
avoid in analyses of phylogenetic conflict, especially 
for events that happened in the distant past. First, 
many problems arise from the limited and often 
noise-riddled phylogenetic information that a gene 
sequence presents about long-ago periods of evolutionary
history. Conserved sequence positions allow
the identification of homologues, but a perfectly conserved
sequence position contains no phylogenetic
information. For example, the amino-acid sequence
of histones or ATP-synthase catalytic subunits is
nearly identical in closely related species, and therefore
useless in reconstructing within-genus relationships.
In addition, sequence positions that experience
little or no purifying selection will rapidly become 
saturated with substitutions and will not retain any 
phylogenetic information 52 . Although all gene families 
retain information for phylogenetic reconstruction at 
some phylogenetic depth, in general, this information 
is insufficient for the reconstruction of relationships 
at most other phylogenetic depths. Another problem 
is that gene duplication followed by gene loss can give 
rise to different gene trees, and therefore conflicting 
phylogenetic signals that are indistinguishable from 
those resulting from HGT (FIG. 3). Therefore, phylogenetic
incongruence is a well defined screen for HGT,
but not unambiguous proof 53 . 

Taxonomists are frequently divided into ‘lumpers’
and ‘splitters’ 54 . At the molecular level, the same
tendencies give rise to those who concatenate data
to extract even the smallest grain of phylogenetic
information that might be distributed over many gene
families, and those who direct their attention to the
signal contained in individual gene families and who
synthesize a consensus only after they have ascertained
that there are compatible signals. Much progress has
been made in confronting the pitfalls inherent to both
approaches, and the different approaches are becoming
more similar in their practical implementation.

BIPARTITION

A bipartition corresponds to an
internal branch in a
phylogenetic tree. A single
bipartition divides the data
(sequences, genomes or species)
into two groups, but it does not
consider the relationships
within each of these groups.

LENTO PLOTS
Phylogenetic analyses using
bipartition data, named after
G.M. Lento. For each
bipartition, the bipartition
spectra give the support for the
bipartition as a histogram bar in
the positive direction, and the
support for all conflicting
bipartitions as a bar in the
negative direction.

GALLED TREES
Trees that contain local
deviations from a strictly
furcating pattern.

Figure 3 | Comparison of two explanations for
unexpected phylogenetic distribution. a | The presence
of a gene with characteristics that are typical for an unrelated
group can be due to horizontal gene transfer (HGT, arrow).
b | An alternative explanation is an ancient gene duplication
(*) followed by differential gene loss (x). The more sister
lineages have only the typical gene, the more independent
gene-loss events must be postulated under this scenario.

One problem with concatenation is the selection of 
data to include. This selection is complicated by the 
fact that the absence of evidence for transfer cannot 
be taken as evidence for the absence of transfer. If one 
applies a stringent measure for conflict, nearly all genes 
agree with the consensus signal within the limits of 
confidence. The amount of conflict detected depends 
on the chosen limits of confidence and on the extent 
of taxon sampling 55-58 . Tests of compatibility between 
different trees and the datasets from which these trees 
were derived 59,60 have become the preferred tool to 
assess the potential conflict between individual gene 
families 61 , but more sophisticated methods that take a 
larger number of possible trees into consideration are 
being developed 62,63 . 


Fractionating phylogenetic information into 
smaller sub-analyses (down to the level of quartets) 
can generate artefacts owing to poor taxon sampling 
(reviewed in ref. 64). However, evaluation of phylogenies
embedded in larger trees effectively addresses
this problem 53 . An advantage of analyses that focus 
on small quanta of phylogenetic information is that 
plurality consensus signals can be extracted, even 
though not a single gene family might be in perfect 
agreement with any other gene phylogeny (supertree 
approaches). Furthermore, gene families that retain 
conflicting phylogenetic information can be identified
readily, even though these genes might not
allow a perfect reconstruction of the gene family’s
phylogeny at all levels of relationship. For example,
spectral analyses of bipartition data, so-called Lento
plots 65 , can be applied to all gene families present
in a selection of genomes 22 . These Lento plots are 
histograms that depict support and conflict for the 
different bipartitions. As bipartitions can readily be 
separated into those that are compatible (those that 
can coexist in a tree) and those that are incompatible 
(bipartitions that cannot coexist in the same tree), 
these spectra readily identify gene families with conflicting
phylogenetic information. The screened and
filtered phylogenetic information can be synthesized
into consensus trees, as is the case in the many supertree
approaches that are being developed 66 . However,
if organismal phylogeny is embedded in a web of 
gene phylogenies that is woven through many gene 
transfers and other reticulation events, analyses that 
directly reconstruct a network, rather than a single 
tree, seem appropriate. 
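
The distinction between compatible and incompatible bipartitions can be made concrete with a short sketch. Two bipartitions of the same taxon set can coexist in one tree exactly when at least one of the four pairwise intersections of their sides is empty. The Python below is a minimal illustration under that standard criterion; the taxon names and the function names are hypothetical, and the sketch does not reproduce any published Lento-plot implementation.

```python
from itertools import combinations


def compatible(side_a, side_b, taxa):
    """Return True if two bipartitions can coexist in the same tree.

    Each bipartition is given by the taxa on one of its sides; the other
    side is the complement within `taxa`.
    """
    a, b, all_taxa = frozenset(side_a), frozenset(side_b), frozenset(taxa)
    a_rest, b_rest = all_taxa - a, all_taxa - b
    # Compatible if at least one of the four intersections is empty.
    return any(not (x & y) for x in (a, a_rest) for y in (b, b_rest))


def conflicting_pairs(bipartitions, taxa):
    """List every pair of bipartitions that cannot coexist in one tree."""
    return [(p, q) for p, q in combinations(bipartitions, 2)
            if not compatible(p, q, taxa)]


# Hypothetical five-taxon example.
taxa = frozenset("ABCDE")
ab = frozenset("AB")    # split AB | CDE
abc = frozenset("ABC")  # split ABC | DE
ac = frozenset("AC")    # split AC | BDE

print(compatible(ab, abc, taxa))  # nested splits can share a tree
print(compatible(ab, ac, taxa))   # these two cannot
print(conflicting_pairs([ab, abc, ac], taxa))
```

In a spectrum of the kind described above, the support for each bipartition would be plotted against the summed support of the bipartitions that this test marks as conflicting with it.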

Networks that contain only individual loops that 
do not share nodes with one another have become 
known as galled trees. The methodology for construction
of these networks of non-interwoven loops
has undergone dramatic progress 67-69 . Similar types
of analysis that focus on conflicting information
contained in a dataset are split-decomposition analyses 70-72 .
These analyses provide visually compelling
illustrations of phylogenetic ambiguity or conflict
contained in data 73 , but the loops in these networks
cannot readily be interpreted as depicting evolutionary
histories. A related approach, recently described 74
and applied to the origin of eukaryotes 75 , reaffirmed 
that eukaryotes contain both archaeal and bacterial 
genes, and suggested an overall ring structure for 
organismal phylogeny. Were the bacterial genes 
found in eukaryotes brought into the eukaryotic cell 
concomitantly through a small number of fusion and 
endosymbiotic events? Were they brought in through 
a steady trickle of smaller HGT events? Or were 
the bacterial genes present in the most recent common
ancestor of all living organisms and lost in the
archaeal lineage? And in any case, are HGT recipients 
receiving genes mostly from a few donors, or from 
many? At present, these alternatives remain under 
vigorous debate 76-81 (FIG. 1). Quantitative models of
these possibilities would allow testing and evaluation 
of their likelihood. 

SELFISH OPERON THEORY
Explains the formation of 
operons through gene transfer. 
According to this theory, genes 
encoding parts of the same 
process become clustered 
because functionally unrelated 
intervening genes become 
useless and are deleted 
following a transfer, and 
because such clustered genes are 
more likely to be successfully 
transferred as a unit compared 
to genes encoded in distant 
parts of the genome. 

ORFANS 

Open reading frames that do 
not have a recognizable 
homologue among known 
sequences. 

RIBOTYPE 

In analogy to genotype and 
phenotype, the type of RNA in 
an organism, usually referring 
to the type of ribosomal RNA. 

SELECTIVE SWEEP 
Fixation of an advantageous 
character in a population. In the 
absence of recombination, the 
advantageous character carries 
with it the whole chromosome 
and erases diversity within the 
population. 

MORONS 

Genetic elements in lambdoid 
phages acquired through recent 
gene transfer. These genes are 
unrelated in function to the 
genes surrounding them. They 
were named morons because 
their addition to the phage 
genome means that there is 
‘more DNA’ than there is
without the element. 


Deleterious, beneficial or neutral? 

Horizontal transfer of functional units can provide 
the recipient with the tools necessary to occupy new 
ecological niches. The selective advantage conferred by 
transferred genes is fundamental to the selfish operon 
theory 82 . This theory is most clearly manifested in 
pathogenicity and ecological islands, which are contiguous
sets of genes acquired through HGT that form
genomic islands of atypical composition. Such sets of 
genes encode functions conferring traits that allow the 
colonization of new ecological niches 83 . The discovery 
of selectable and easily transferred genomic islands gave 
rise to the expectation that, for genes to be horizontally 
transferred and successfully integrated into the recipient
genome, the transferred genes would need to provide
a selective advantage either to the recipient or (in
the case of parasitic genetic elements) to themselves 84 . 
Understanding the frequency with which horizontally 
transferred genes confer a benefit, are neutral or nearly 
so, or are deleterious is of great importance 85 , both for 
our understanding of the impact of HGT on the evolution
of microorganisms 86 and for the practical purpose
of understanding the potential spread of transgenes to 
natural microbial populations 87 . 

Comparison of genomes from strains of E. coli has 
revealed that the core of genes present in all strains 
of this microbial species is surprisingly small 15 . Many 
of the recently transferred genes are not present in 
sister taxa 41 . These ‘orfans’ are on average shorter than 
other genes, contain a higher percentage of A and T 
nucleotides, and have a codon usage that is similar to 
that found in phage and plasmids 41,88,89 . Intriguingly,
the average codon usage in the recently transferred
genes is even more extreme than the average calculated
from phage genes 41 . If the atypical composition
were to reflect the genomic bias of the previous
bacterial host, then atypical genes with a higher GC 
content should frequently be found, but this is not 
the case. The recently acquired genes have a higher 
AT content than the typical chromosomal genes, and 
this seems to be true even for Bacteria with a high 
AT content 41 . These findings indicate the presence of 
a ‘vapour’ of transient genes that surrounds a stable
set of core genes 90 . The genes in the vapour cloud 
sometimes reside within the bacterial chromosomes, 
but perhaps more frequently reside in phage and 
extrachromosomal genetic elements. Daubin et al. 
calculated that these genes have a high turnover rate 
in the genome 41 . 

In those instances where acquired genes were
present in two or more closely related E. coli and
Salmonella enterica genomes, the ratio of non-synonymous
(K_a) to synonymous (K_s) substitutions
indicated that most of these genes were under purifying
selection; that is, nucleotide substitutions that
change the encoded amino acid occur at a lower rate
than substitutions that leave the encoded amino acid
unchanged by virtue of the redundancy of the genetic
code. However, the K_a/K_s ratio for these transferred
genes, while indicative of purifying selection, is
higher than for other E. coli genes. For transferred
genes present in both E. coli and S. typhimurium, the
K_a/K_s ratio was calculated as 0.19, whereas the genes
classified as ‘native’ had a K_a/K_s ratio of 0.05 (ref. 89).
So, although these apparently transient genes are
under purifying selection, this selection is weak. In
part, the weak selection might be the consequence
of selection against novel deleterious function (for
example, protein ‘toxicity’ 91 ) instead of a need to
retain a selective advantage that these genes provide
to their host.
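
The way such ratios are read can be sketched in a few lines of Python. This is an illustration only: the function names and the tolerance band around 1 are assumptions introduced here, and real analyses estimate the two substitution rates from codon alignments and test the ratio statistically rather than applying a fixed cut-off. The two ratios are the values quoted above.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    if ks <= 0:
        raise ValueError("ks must be positive")
    return ka / ks


def selection_regime(ratio: float, tolerance: float = 0.05) -> str:
    """Coarse reading of a ratio; `tolerance` is a hypothetical band around 1."""
    if ratio < 1.0 - tolerance:
        return "purifying"
    if ratio > 1.0 + tolerance:
        return "positive"
    return "approximately neutral"


transferred = 0.19  # transferred genes shared by E. coli and S. typhimurium
native = 0.05       # genes classified as 'native'

for label, ratio in (("transferred", transferred), ("native", native)):
    print(f"{label}: ratio = {ratio:.2f}, regime = {selection_regime(ratio)}")

# Both classes fall in the purifying range, but the constraint on the
# transferred genes is several-fold weaker than on the native genes.
print(f"fold difference: {transferred / native:.1f}")
```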

The notion that many of these non-core genes might 
be selectively neutral or nearly neutral is also suggested
by recent studies of a marine bacterioplankton
population of Vibrio splendidus by Thompson et al. 92 
Even though the analysed bacteria all fall into a tight 
ribotype cluster with less than 1% sequence divergence 
in the 16S rRNA gene, the diversity at the genome 
level is astounding: among the 206 strains tested, 
180 unique genotypes were determined by pulsed-field
gel electrophoresis. Individual genotypes are present at 
low concentration: Thompson et al. estimate that the 
population (defined as the ribotype cluster) contains 
>1,000 unique genotypes. Twelve strains that were 
analysed in more detail differ in genome size between 
4.5 and 5.6 Mb. Apparently, none of the detected 
variations provided sufficient selective advantage to 
initiate a selective sweep (also known as a periodic 
selection event) 93 . Either the ribotype cluster consists 
of many distinct subpopulations that each occupy a 
distinct ecological niche, or the astounding genomic 
variability found in this study is selectively neutral or 
nearly neutral. 

The following picture is emerging: a large amount 
of gene swapping and gene exchange occurs between 
chromosomal and non-chromosomal genes. Most 
of these transfers are nearly neutral to the recipient;
some might increase the fitness of phage and viruses
(morons 41,94 ) under some conditions. Within the large 
pool of recently transferred genes, there are a few genes 
that increase the fitness of the recipient. These rare 
transfers can become fixed owing to a selective sweep, 
and it is only these latter transfers that are usually 
detected using comparative molecular phylogenies. 

Population genetics for prokaryotes 

The biological species concept 95 defines a species as 
a potentially interbreeding group of organisms that 
are capable of producing fertile offspring. Within 
such groups, gene phylogenies are seldom congruent,
owing to high rates of gene flow and recombination 19,38 .
Which phenomena generate cohesion within
a prokaryotic species? Two different, but not mutually 
exclusive, mechanisms have been suggested. First, high 
levels of gene transfer followed by homologous recombination
could play the part that sexual reproduction
plays for gene flow in multicellular eukaryotes 38 . In this
case, cohesion would be maintained by high levels of
genetic exchange. Second, cohesion could also be generated
through selective sweeps that occur if a gene that provides
a selective advantage to its carrier arises through
mutation or gene transfer 96 . 

ILLEGITIMATE
RECOMBINATION 
Recombination between two 
non-homologous DNA 
segments. Usually, illegitimate 
recombination is fairly 
infrequent. 

LEADING AND LAGGING DNA 
STRAND 

DNA replication on the leading 
strand occurs continuously in a 
5' to 3' direction, whereas DNA 
replication on the lagging 
strand occurs discontinuously 
through the synthesis of short 
Okazaki fragments. 


The theory for prokaryotic (and perhaps microbial) 
evolution needs further work: a population genetics 
and molecular evolution theory for organisms that 
have no traditional species boundaries, but share 
genes across considerable evolutionary distance 87 . 
Clearly, prokaryotes, and possibly single-celled 
eukaryotes, are under different selective pressures 
with regard to DNA exchange than most multicellular
organisms. Unlike animals, for instance,
prokaryotes and single-celled eukaryotes experience 
recombination less frequently than they reproduce, 
and the quantity of DNA exchanged is small 96,97 . 
Consequently, there is little selective advantage in 
preventing such rare interspecific exchange: the selection
coefficient, even if all transfers were completely
lethal, could be no larger than the rate of recombination.
With potential rates of recombination as low as
10⁻⁷ to 10⁻¹⁰ per generation, and selection coefficients
against recombination itself constrained to be this low 
(if always lethal) or lower (if frequently neutral), it 
seems reasonable to assume that only the weakest 
of selection has operated on uptake mechanisms as 
a direct consequence of the horizontal transfer of 
deleterious DNA fragments. 
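
The bound in this argument can be illustrated with simple arithmetic. If a lineage takes up foreign DNA at a given rate per generation and each uptake reduces fitness by some fraction, the expected fitness loss per generation is the product of the two, which can never exceed the uptake rate itself. The sketch below is a minimal Python illustration under that simplifying assumption; the function name and the 10% cost used for the ‘frequently neutral’ case are hypothetical.

```python
def selection_against_uptake(transfer_rate: float,
                             cost_per_transfer: float = 1.0) -> float:
    """Expected per-generation fitness loss caused by DNA uptake.

    transfer_rate: probability of a transfer per generation.
    cost_per_transfer: fractional fitness loss per transfer (1.0 = lethal).
    """
    if transfer_rate < 0:
        raise ValueError("transfer_rate must be non-negative")
    if not 0.0 <= cost_per_transfer <= 1.0:
        raise ValueError("cost_per_transfer must lie between 0 and 1")
    return transfer_rate * cost_per_transfer


for rate in (1e-7, 1e-10):
    always_lethal = selection_against_uptake(rate)          # upper bound
    mostly_neutral = selection_against_uptake(rate, 0.1)    # hypothetical 10% cost
    print(f"rate {rate:.0e}: s <= {always_lethal:.0e} (always lethal), "
          f"s = {mostly_neutral:.0e} (mostly neutral)")
```

Even in the always-lethal case, the selection coefficient equals the transfer rate, which is why selection on the uptake machinery through this route is expected to be extremely weak.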

The main difference between prokaryotes and
multicellular eukaryotes is that the species boundaries
for prokaryotes are ‘fuzzy’ 38,98 . Homologous recombination
is not limited to genes exchanged within a
species, and illegitimate recombination can incorporate
genes from divergent donors. If a novel gene arises
that provides a selective advantage, this invention
can be shared between unrelated organisms through
HGT. Recombination rates subsequent to interspecies
transfer might need to be high in some diverse recipient
species, because otherwise the observed recipient
species diversity is inexplicable, as it would have been
wiped out by selective sweeps in which the advantageous
gene carries with it the complete genome 93,99 .
The group within which innovations can be exchanged
will be different for different genes: some genes will be
advantageous only within the environment in which
they originated, whereas others will provide a selective
advantage even if the gene is transferred across
domain boundaries. One result of the larger exchange
groups that are created through HGT is an accelerated
rate of innovation. Jain et al. 23 estimate that the
innovation rate increases 10^4- to 10^10-fold owing to
HGT, and Townsend et al. 86 show that recombination
across traditional species lines can potentially accelerate,
by similarly large magnitudes, the acquisition of
adaptively important traits that require several amino
acid changes.

The evolution of the hsp70 (dnaK) gene provides
an example of an innovation that is apparently 
spreading to divergent organisms, and it illustrates 
that gene histories can be different from the history 
of organismal evolution. Homologues of hsp70 are 
found in members of all three domains of life 44 , but 
on closer inspection, there is no other evidence to 
support the notion that hsp70 was present in the 
most recent common ancestor of all organisms. In
molecular phylogenies, the archaeal homologues are
interspersed within the Bacteria, and many Archaea, 
including the Crenarchaeota, do not encode an hsp70 
orthologue in their genome. It therefore seems likely 
that this gene was absent in the archaeal ancestor, 
and was only acquired more recently by some of the 
Archaea by HGT 43,45,46 . The most recent common 
ancestor of all present-day hsp70 genes seems to have 
existed more recently than the most recent common 
ancestor of all organisms. 

The conclusion that different genes coalesce to 
different molecular ancestors and that these molecular
ancestors existed in different organismal lineages
and at different times is not limited to hsp70, but is 
possibly true for all gene families 17 . Several studies 
have determined the frequency with which genes 
belonging to different functional categories are 
being transferred 21,27,100 , frequently using the COG 101 
(clusters of orthologous groups) classification, and 
some environmental and genome characteristics 
were studied with respect to the influence they have 
on gene-transfer frequency 23 . To more realistically 
reconstruct the interplay between HGT and vertical
inheritance, more detailed quantitative studies
are needed to determine the factors that govern 
HGT frequency. 

Gogarten et al. 19 point out that the frequency of
successful exchange between taxa will depend on five
factors: propinquity, metabolic compatibility, adaptations
to their abiotic environment, gene expression
systems and gene-transfer mechanisms. At present,
most of the evidence regarding the relative importance
of these factors is anecdotal, and the only
systematic comparative study 23 had limited power,
owing to sparse taxon sampling. All of these factors
correlate with genetic relatedness and therefore
with DNA-sequence divergence. For instance, in the
context of homology-assisted heterologous recombination,
there is a well-characterized quantitative effect:
greater DNA-sequence divergence results in
lower homologous recombination rates in E. coli 102 ,
Bacillus subtilis 103,104 and Streptococcus pneumoniae 105 .
Lawrence and Hendrickson 106 characterized short
oligonucleotide sequences with asymmetric distribution
on the LEADING AND LAGGING DNA STRAND. These
sequences might have a role in genome replication,
and they are conserved only between closely related
organisms, whereas in distantly related organisms
the same motif occurs abundantly on either DNA
strand. A sequence from a phylogenetically distant
donor that contains a recipient’s regulatory motif on
both strands might incur a selective disadvantage,
effectively biasing successful transfers towards more
closely related organisms. These observations could
be incorporated into a theoretical framework for
the evolution of microorganisms that includes
HGT and relies on DNA-sequence divergence as a
quantitative barrier, instead of species designation
as a qualitative barrier, to recombination between
microorganisms 86 . Such a framework requires quantitative
characterization of the environmental density

Figure 4 | Graphical schema for the quantification and modelling of recombination with divergent DNA and
horizontal gene transfer. To develop a ‘population genetics’ theory for microorganisms that take up divergent DNA,
sequence divergence can be considered as a mathematically continuous measure of the species barrier. a | Quantification
of the environmental distribution of divergent DNA (homogeneous DNA in blue, heterogeneous DNA in red). b |
Quantification of the probability of integration of divergent DNA into the genome. c | Quantification of the probable selective
effects of that integration. These three graphs illustrate several hypotheses regarding donor DNA sequences of particular
divergences. First, that a sequence with neutral or nearly neutral effects is particularly likely to come from donors that are
not highly divergent. Second, that incorporated DNA is, by contrast, increasingly likely to be deleterious with greater
divergence. Lastly, that most (rare) beneficial incorporations of DNA will come from some intermediate level of divergence.
Because each of these distributions (a-c) has the same x axis (sequence divergence), they can be visually or analytically
multiplied (a × b × c) to form an evolutionary model, leading to d | quantitative predictions relating to the effects of horizontal
gene transfer on microbial evolution.


of divergent DNA, the probability of chromosomal
integration in the organisms of interest and the
selective value of novel microbial traits (Fig. 4; see
also the article by C.M. Thomas & K.M. Nielsen in
this issue for a more mechanistic discussion of these
factors). All of these quantitative characterizations
are vital to the further understanding and development
of a comprehensive ‘population genetics’
theory for microbial communities. Such a formalism
should encompass gene flow among divergent
microorganisms. Recent studies of population-level
sequence variation in the archaea Sulfolobus
islandicus (R.J. Whitaker, D. Grogan & J.W. Taylor,
unpublished results) and Halorubrum sp. 107 provide
evidence of extensive intra- and interpopulation
recombination. Quantitative models of HGT need to
be constructed and tested against the actual sequence
variation observed in such microbial populations.
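
The multiplication of distributions described in the Figure 4 schema can be sketched numerically. The following is a minimal illustration only: every functional form and parameter value (the exponential environmental density, the log-linear decline in integration probability, the saturating probability of a beneficial effect, and the names env_density, integration_prob, beneficial_prob and beneficial_rate) is a hypothetical assumption chosen for illustration, not a measurement or a model from the article.

```python
import math

# All curve shapes and parameters below are illustrative assumptions.

def env_density(x, mean_div=0.05):
    """Panel a (assumed): relative abundance of environmental DNA at
    sequence divergence x, modelled as an exponential density."""
    return math.exp(-x / mean_div) / mean_div

def integration_prob(x, p0=1e-3, slope=20.0):
    """Panel b (assumed): probability of integration, declining
    log-linearly with sequence divergence x."""
    return p0 * math.exp(-slope * x)

def beneficial_prob(x, p_max=1e-2, half_sat=0.10):
    """Panel c (assumed): chance that an integrated fragment is beneficial;
    zero for identical DNA, saturating at p_max for very divergent DNA."""
    return p_max * x / (x + half_sat)

def beneficial_rate(x):
    """Product a x b x c: relative rate of beneficial incorporations
    contributed by donors at divergence x (arbitrary units)."""
    return env_density(x) * integration_prob(x) * beneficial_prob(x)

# Evaluate the product on a grid of divergences from 0 to 30%.
MAX_DIV = 0.30
STEPS = 300
grid = [MAX_DIV * i / STEPS for i in range(STEPS + 1)]
rates = [beneficial_rate(x) for x in grid]

# Divergence at which the product of the three curves is largest.
peak_div = grid[rates.index(max(rates))]
print(f"Beneficial incorporations peak at divergence {peak_div:.3f}")
```

With these assumed shapes, the product is zero for identical DNA, falls away at high divergence, and peaks at a non-zero divergence in between, which is the qualitative pattern hypothesized in the figure legend. Replacing the three placeholder functions with empirically measured distributions is the step the text identifies as still requiring quantitative work.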


In addition to surveys of extant variation, the construction
of useful models of the effect of gene transfer
on evolving populations requires experimental
work on the ecological effects and selection coefficients
associated with non-core genes. Ultimately, the
interplay between new survey data, experimental
data and models of gene transfer will not only
elucidate the dynamics of gene transfer, but will also
clarify the degree to which gene transfer has
affected the evolution of microorganisms and
help to refine methods for the detection of horizontally
transferred genes. Such refinements will
help to resolve the ambiguity between HGT and
shared ancestry as causes of the patterns that we
depict as shared ancestry in phylogenetic trees.
Models of HGT should be developed and tested
against growing datasets to distinguish between
these alternatives.